FINAL REPORT


Summary of Research


Description: 

Our research, supported by the ARO/SDI grant, has had several successes
in its two principal lines of investigation: model-based object
detection/tracking/recognition, and speech recognition via nonparametric
statistical techniques. A major goal of our research has been the
development of a sound and unified theoretical basis for the design of
models and algorithms, and for overcoming the underlying massive
computational and combinatorial problems. The modelling and algorithms
have been strongly influenced by, and have been implemented on,
real-world applications. The parallel study of object and speech
recognition has benefited both areas. Our main projects and contributions
to application and theory may be divided into three groups:

1. Object Detection and Tracking: We have explored a statistical
Bayesian framework for simultaneously describing and tracking objects on
the basis of image sequence frames. The framework has been successfully
tested in a highway traffic scenario (see Images 1 and 2). It involves two
major components: object models, and spatial-temporal data models.

Our object models are deformable templates. Vehicles, for example, are
represented by prototypes, but their silhouettes on the 2-D image plane
exhibit a great deal of variability depending on the object’s distance and
orientation relative to the camera. These variabilities are articulated via
a prior distribution on the “shape space”. The spatial-temporal data models
are designed using three (or more) consecutive frames at a time. To deal
with the variability of the observed grey levels due to variations in
lighting, contrast, texture, and other effects, we employ nonparametric
statistics such as rank tests and the Kolmogorov-Smirnov statistic. In
addition to the random variation of “shape” and image data, in the highway
problem there is a third variability: the number of vehicles in a given
frame is unknown, and it may vary from frame to frame. This is treated by
using a Poisson-type process.
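
The role of the Kolmogorov-Smirnov statistic can be illustrated with a
minimal sketch. It is not the report's implementation: the function name,
the synthetic grey levels, and the affine "relighting" map are all
assumptions chosen for illustration. The sketch compares the grey levels
inside a candidate template with those outside it, and shows that the
statistic is unchanged when the same increasing transformation (a change
of contrast and brightness) is applied to both samples.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a = np.sort(np.asarray(sample_a, dtype=float).ravel())
    b = np.sort(np.asarray(sample_b, dtype=float).ravel())
    # Evaluate both empirical CDFs at every observed value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical data: grey levels inside and outside a candidate template.
rng = np.random.default_rng(0)
inside = rng.integers(0, 100, size=200)
outside = rng.integers(80, 256, size=200)

d_original = ks_statistic(inside, outside)
# Same scene under a different (assumed) contrast and brightness setting.
d_relit = ks_statistic(0.5 * inside + 40, 0.5 * outside + 40)
```

Because the statistic depends only on the ordering of the pooled grey
levels, `d_original` and `d_relit` coincide, which is the kind of
invariance to imaging conditions that motivates rank-based data models.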

2. Object Recognition: We have completed an Xlib-based computer program
with a graphical interface for recognizing 2-D objects in environments
highly degraded by noise, blur, clutter, and occlusion. The algorithm has
been tested on a small database of 2-D industrial tools such as pliers,
hammers, screws, etc.; the results have been encouraging.

The recognition framework emphasizes object representation, data models,
and algorithmic issues. These are, briefly, as follows:

(1) The object representation is based on Stochastic Hierarchical Models
(SHM), which are variants of Stochastic Context-Free Grammars (in the
Chomsky hierarchy of grammars). Our SHMs have two levels of hierarchy and
syntax. The first level (top level) views an object as a concatenation of
its articulated joints and parts; it is represented by a directed graph
structure called the membership graph. Each node in the membership graph
is associated with a “high-level” primitive (i.e. a component part) that
may be common to several objects; the arcs of the membership graph
correspond to syntactic constraints relating the various parts, and these
constraints are either topological (qualitative) or geometric. The second
level of the hierarchy serves to represent the boundaries of the
high-level primitives by a cascade of “lower-level” primitives or units,
starting with local edges (“edgelets”) which are concatenated to give
small line segments (“linelets”) which, in turn, are concatenated to give
more global boundaries or surfaces. The entire concatenation process is
represented by a Markov process with “jumps”, which allows one boundary
segment to terminate and change (“jump”) into another boundary segment.

(2) The lower-level elementary units interact directly with basic local
descriptions of the grey-level image data. The local data descriptions are
properly designed nonparametric statistics, i.e. local functions of the
data that are invariant under changes in imaging conditions and
degradations.

(3) The combination of SHM with the data models leads to a formulation of
the recognition problem as a global optimization problem which, in view of
the recursive structure, lends itself to variations of dynamic
programming. The dynamic programming process involves a large state space,
and it requires the maintenance of a multitude of intermediate data
structures. This prohibits the possibility of exact computations, and
hence efficient pruning procedures are required. We have developed various
optimal and sub-optimal heuristics for pruning, using a multiresolution
analysis procedure. The overall recognition algorithm leads to a
simultaneous interpretation at multiple levels (“low-level” primitives and
“high-level” complex entities); no primitive at any level is determined
until the entire computation is completed.
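
The idea of dynamic programming with pruning can be sketched as follows.
This is a generic beam-pruned recursion of the Viterbi type, not the
multiresolution pruning heuristics of the report, which are not specified
here. The state set, the cost functions, and the name `beam_viterbi` are
hypothetical. States stand for boundary-segment labels, and a change of
state plays the role of a “jump”.

```python
def beam_viterbi(n_steps, states, local_cost, transition_cost, beam_width):
    """Minimum-cost state sequence by dynamic programming, extending only
    the `beam_width` cheapest partial paths at each step. With
    beam_width == len(states) the recursion is exact; smaller values give
    a sub-optimal but cheaper search."""
    # frontier maps state -> (cost of best partial path ending there, path)
    frontier = {s: (local_cost(0, s), [s]) for s in states}
    for t in range(1, n_steps):
        # Pruning step: discard all but the cheapest partial hypotheses.
        kept = sorted(frontier.items(), key=lambda kv: kv[1][0])[:beam_width]
        new_frontier = {}
        for s in states:
            best = None
            for prev, (cost, path) in kept:
                c = cost + transition_cost(prev, s) + local_cost(t, s)
                if best is None or c < best[0]:
                    best = (c, path + [s])
            new_frontier[s] = best
        frontier = new_frontier
    # Nothing is decided until the last step: the whole path is read off here.
    return min(frontier.values(), key=lambda cp: cp[0])

# Hypothetical toy problem: one noisy label per step, four segment labels.
obs = [0, 0, 3, 3, 3]
states = [0, 1, 2, 3]
local_cost = lambda t, s: abs(obs[t] - s)          # data term
transition_cost = lambda p, s: 0.5 * abs(p - s)    # penalty for a "jump"

exact_cost, exact_path = beam_viterbi(len(obs), states, local_cost,
                                      transition_cost, beam_width=len(states))
pruned_cost, pruned_path = beam_viterbi(len(obs), states, local_cost,
                                        transition_cost, beam_width=1)
```

The final `min` reflects the property noted above: no label at any step is
fixed until the entire computation is completed. A narrow beam can miss
the optimum in general, so the pruned cost is never below the exact one.
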

3. Speech Recognition: We have developed a new acoustic model for speech
recognition as an alternative to the HMM approach; it exploits three basic
tools: a wavelet representation of the raw signal, and its induced
“waveletogram”; nonparametric transformations of the waveletogram; and
1-D Markov Random Field (MRF) models (analogous to Markov models for
phonemes in the HMM approach). Most speech recognition procedures
(including HMM) assume that short time segments of the acoustic signal are
stationary and linear. Hence, the signal is analyzed via the Fourier
Transform (FT) and linear models such as Linear Predictive Coding (LPC).
These procedures are adequate in some parts of the signal (e.g. steady
states of vowels), but not in other parts: nonstationarities in burst and
transition regions (e.g. consonant to vowel) make the application of FT
questionable, and nonlinearities contain important information that cannot
be captured by LPC. The former of these difficulties (nonstationarity) is
alleviated by using wavelets, while the latter points to nonparametric
statistics. The output of the nonparametric transformations may be viewed
as a (compressed) process which is modeled by appropriate 1-D MRFs. Our
procedure has also been applied to an important linguistic task: the
classification of the six stop consonants /p, t, k, b, d, g/ on the basis
of CV (Consonant-Vowel) or VC syllables. Our procedure yields interesting
2-D clustering plots for vowels and consonants. We know of no other method
in the literature that gives such scatterplots for stop consonants.
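
The first two tools can be illustrated with a minimal sketch. The report
does not state which wavelet or which nonparametric transformation was
used, so the Haar wavelet and the global rank transform below are
assumptions, as are the function names and the synthetic signal. The
sketch builds a scale-by-time array of wavelet detail magnitudes (a simple
stand-in for a “waveletogram”) and replaces its entries by their ranks.

```python
import numpy as np

def haar_waveletogram(signal, n_levels):
    """Magnitudes of Haar wavelet detail coefficients, one row per scale.
    Each row is repeated in time so all rows share the signal's length."""
    x = np.asarray(signal, dtype=float)
    n = (x.size >> n_levels) << n_levels  # largest multiple of 2**n_levels
    if n == 0:
        raise ValueError("signal too short for the requested number of levels")
    approx = x[:n]
    rows = []
    for level in range(1, n_levels + 1):
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        rows.append(np.repeat(np.abs(detail), 2 ** level))
    return np.vstack(rows)

def rank_transform(values):
    """Replace each entry by its rank (0 = smallest) among all entries."""
    flat = values.ravel()
    ranks = np.empty(flat.size, dtype=int)
    ranks[np.argsort(flat, kind="stable")] = np.arange(flat.size)
    return ranks.reshape(values.shape)

# Hypothetical signal: 64 samples of noise standing in for a speech frame.
rng = np.random.default_rng(1)
x = rng.standard_normal(64)

W = haar_waveletogram(x, n_levels=3)
R = rank_transform(W)
# The same signal recorded at twice the gain gives the same ranks.
R_loud = rank_transform(haar_waveletogram(2.0 * x, n_levels=3))
```

The rank array is unaffected by the change of gain, which is one simple
sense in which a nonparametric transformation compresses the waveletogram
while discarding nuisance variation.
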
LIST OF IMAGES


Figure 1. Detecting and tracking the fastest moving vehicle: (a) frame two, (b)
frame four, (c) frame six, (d) frame eight. The small rectangle in (a) is
the initial configuration in the Metropolis algorithm.

Figure 2. Detecting and tracking vehicles moving away from a camera fixed on
a bridge: panels (a)-(d) show frames two, four, six, and eight,
respectively. The three small squares in panel (a) are the initial
configurations in the Metropolis algorithm.

[Figure 1: Single object segmentation and tracking (times 2, 4, 6 and 8)]

[Figure 2: Three object segmentation and tracking (times 2, 4, 6 and 8)]